Variational bagging: a robust approach for Bayesian uncertainty quantification

Fan, Shitao, Ohn, Ilsang, Dunson, David, Lin, Lizhen

arXiv.org Machine Learning

Variational Bayes methods are popular due to their computational efficiency and adaptability to diverse applications. In specifying the variational family, mean-field classes are commonly used; these enable efficient algorithms such as coordinate ascent variational inference (CAVI) but fail to capture parameter dependence and typically underestimate uncertainty. In this work, we introduce a variational bagging approach that integrates a bagging procedure with variational Bayes, resulting in a bagged variational posterior for improved inference. We establish strong theoretical guarantees, including posterior contraction rates for general models and a Bernstein-von Mises (BVM) type theorem that ensures valid uncertainty quantification. Notably, our results show that even when using a mean-field variational family, our approach can recover off-diagonal elements of the limiting covariance structure and provide proper uncertainty quantification. In addition, variational bagging is robust to model misspecification, with the resulting covariance structure matching the target covariance. We illustrate our variational bagging method in numerical studies through applications to parametric models, finite mixture models, deep neural networks, and variational autoencoders (VAEs).
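
The following toy sketch illustrates the general recipe described above, namely fitting a mean-field variational approximation on each bootstrap resample and pooling the resulting posteriors. It uses a bivariate Gaussian mean model with a known, correlated noise covariance and a flat prior, for which the mean-field optimum is available in closed form. The model, prior, number of bootstrap replicates, and the naive pooling-by-mixing step are assumptions made purely for illustration, not the authors' exact algorithm or its calibration.

# Minimal sketch: bagging mean-field variational posteriors on a toy
# bivariate Gaussian mean model with known, correlated noise covariance.
# Illustration of the general recipe only, not the paper's algorithm.
import numpy as np

rng = np.random.default_rng(0)
n, B, S = 200, 50, 200                      # sample size, bootstrap reps, draws per rep
Sigma = np.array([[1.0, 0.8], [0.8, 1.0]])  # known, correlated noise covariance
theta_true = np.array([0.5, -0.5])
y = rng.multivariate_normal(theta_true, Sigma, size=n)

def mean_field_vb(data):
    """Closed-form mean-field (fully factorized Gaussian) fit under a flat prior.
    The factorized optimum has mean ybar and variances 1 / (n * diag(Sigma^{-1})),
    so it ignores the posterior correlation and shrinks the marginal variances."""
    Lam = np.linalg.inv(Sigma)
    return data.mean(axis=0), 1.0 / (len(data) * np.diag(Lam))

# Bagged variational posterior: fit VB on each bootstrap resample and pool draws.
draws = []
for _ in range(B):
    boot = y[rng.integers(0, n, size=n)]
    m, v = mean_field_vb(boot)
    draws.append(m + np.sqrt(v) * rng.standard_normal((S, 2)))
draws = np.vstack(draws)

print("exact posterior covariance:\n", Sigma / n)            # has an off-diagonal term
print("single mean-field variances:", mean_field_vb(y)[1])   # diagonal only, too small
print("bagged posterior covariance:\n", np.cov(draws.T))     # off-diagonal spread recovered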


LASER: A new method for locally adaptive nonparametric regression

Chatterjee, Sabyasachi, Goswami, Subhajit, Mukherjee, Soumendu Sundar

arXiv.org Machine Learning

In this article, we introduce LASER (Locally Adaptive Smoothing Estimator for Regression), a computationally efficient locally adaptive nonparametric regression method that performs variable bandwidth local polynomial regression. We prove that it adapts (near-)optimally to the local Hölder exponent of the underlying regression function simultaneously at all points in its domain. Furthermore, we show that there is a single ideal choice of a global tuning parameter under which the above-mentioned local adaptivity holds. Despite the vast literature on nonparametric regression, instances of practicable methods with provable guarantees of such a strong notion of local adaptivity are rare. The proposed method achieves excellent performance across a broad range of numerical experiments in comparison to popular alternative locally adaptive methods.
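
As a rough companion to the abstract, the sketch below implements variable-bandwidth local linear regression with a simple Lepski-style per-point bandwidth choice, which conveys the flavor of pointwise adaptivity. The boxcar kernel, the dyadic bandwidth grid, the noise-level threshold, and the constant c are assumptions of this sketch; the actual LASER bandwidth selection rule and its single global tuning parameter are specified in the paper and are not reproduced here.

# Hedged sketch: local linear regression with a per-point, Lepski-style
# bandwidth choice (not the LASER rule itself).
import numpy as np

def local_linear(x0, x, y, h):
    """Weighted least-squares local linear fit at x0 with a boxcar kernel of bandwidth h."""
    w = (np.abs(x - x0) <= h).astype(float)
    if w.sum() < 3:
        return np.nan
    X = np.column_stack([np.ones_like(x), x - x0])
    W = np.diag(w)
    beta = np.linalg.lstsq(W @ X, W @ y, rcond=None)[0]
    return beta[0]                                   # fitted value at x0

def adaptive_fit(x, y, x_eval, sigma=0.3, c=2.0):
    """At each evaluation point, pick the largest dyadic bandwidth whose fit agrees,
    up to a noise-level threshold, with the fits at all smaller bandwidths."""
    bandwidths = [2.0 ** (-j) for j in range(1, 7)]  # from large to small
    fits = np.empty(len(x_eval))
    for i, x0 in enumerate(x_eval):
        ests = [local_linear(x0, x, y, h) for h in bandwidths]
        chosen = ests[-1]                            # smallest-bandwidth fallback
        for k, h in enumerate(bandwidths):
            n_h = max(int((np.abs(x - x0) <= h).sum()), 1)
            thresh = c * sigma / np.sqrt(n_h)
            if all(np.isnan(e) or abs(ests[k] - e) <= thresh for e in ests[k:]):
                chosen = ests[k]
                break
        fits[i] = chosen
    return fits

# toy signal whose smoothness varies over the domain
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 1, 500))
f = np.where(x < 0.5, np.sin(4 * np.pi * x), np.sign(np.sin(40 * np.pi * x)))
y = f + 0.3 * rng.standard_normal(x.size)
fhat = adaptive_fit(x, y, np.linspace(0.05, 0.95, 200))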


Convergence of Dirichlet Forms for MCMC Optimal Scaling with Dependent Target Distributions on Large Graphs

Ning, Ning

arXiv.org Machine Learning

Markov chain Monte Carlo (MCMC) algorithms have played a significant role in statistics, physics, machine learning, and other fields, and they are the only known general and efficient approach for some high-dimensional problems. The random walk Metropolis (RWM) algorithm, the most classical MCMC algorithm, has had a great influence on the development and practice of science and engineering. The behavior of the RWM algorithm in high-dimensional problems is typically investigated through a weak convergence result for diffusion processes. In this paper, we utilize the Mosco convergence of Dirichlet forms to analyze the RWM algorithm on large graphs, whose target distribution is a Gibbs measure, a class that includes any probability measure satisfying a Markov property. The abstract and powerful theory of Dirichlet forms allows us to work directly and naturally on the infinite-dimensional space, and our notion of Mosco convergence allows Dirichlet forms associated with the RWM chains to lie on changing Hilbert spaces. Through the optimal scaling problem, we demonstrate the strengths of the Dirichlet form approach over the standard diffusion approach.
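
The paper's contribution is analytical, but the optimal scaling setting it studies is easy to simulate. The sketch below runs random walk Metropolis targeting a nearest-neighbour Gibbs measure on a path graph, with proposal standard deviation scaled like l / sqrt(N), and prints the average acceptance rate as the graph grows. The quadratic potential, interaction strength, scaling constant, and chain length are arbitrary illustrative choices, not quantities from the paper.

# Numerical illustration of RWM optimal scaling for a Gibbs measure on a path graph.
import numpy as np

def log_gibbs(x, beta=0.4):
    """Unnormalized log-density: quadratic self-potential plus nearest-neighbour coupling."""
    return -0.5 * np.sum(x ** 2) + beta * np.sum(x[:-1] * x[1:])

def rwm_acceptance(N, l=2.4, iters=20000, seed=0):
    rng = np.random.default_rng(seed)
    x = np.zeros(N)
    lp = log_gibbs(x)
    accepts = 0
    for _ in range(iters):
        prop = x + (l / np.sqrt(N)) * rng.standard_normal(N)
        lp_prop = log_gibbs(prop)
        if np.log(rng.uniform()) < lp_prop - lp:     # Metropolis accept/reject
            x, lp = prop, lp_prop
            accepts += 1
    return accepts / iters

for N in (10, 50, 200, 800):
    print(N, round(rwm_acceptance(N), 3))            # acceptance rate stabilizes as N grows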


Iterated Block Particle Filter for High-dimensional Parameter Learning: Beating the Curse of Dimensionality

Ning, Ning, Ionides, Edward L.

arXiv.org Machine Learning

Parameter learning for high-dimensional, partially observed, and nonlinear stochastic processes is a methodological challenge. Spatiotemporal disease transmission systems provide examples of such processes giving rise to open inference problems. We propose the iterated block particle filter (IBPF) algorithm for learning high-dimensional parameters over graphical state space models with general state spaces, measures, transition densities, and graph structures. Theoretical performance guarantees are obtained for beating the curse of dimensionality (COD), algorithm convergence, and likelihood maximization. Experiments on a highly nonlinear and non-Gaussian spatiotemporal model for measles transmission reveal that the iterated ensemble Kalman filter algorithm (Li et al. (2020)) is ineffective and the iterated filtering algorithm (Ionides et al. (2015)) suffers from the COD, while our IBPF algorithm beats the COD consistently across various experiments with different metrics.
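
To make the block structure concrete, the sketch below implements the core block particle filter step that IBPF builds on: particles are propagated jointly, but weighting and resampling are done independently within spatial blocks using only each block's local observations. The linear toy dynamics, block sizes, noise levels, and particle count are assumptions for illustration; the full IBPF additionally iterates over the data with perturbed parameters to maximize the likelihood, which is not shown here.

# Hedged sketch of a block particle filter on a toy spatiotemporal model.
import numpy as np

rng = np.random.default_rng(2)
D, T, J = 12, 50, 300                                    # dimension, time steps, particles
blocks = [np.arange(i, i + 3) for i in range(0, D, 3)]   # four blocks of size 3

def step(x):
    """Toy spatial dynamics: AR(1) with nearest-neighbour averaging plus noise."""
    neigh = 0.5 * x + 0.25 * (np.roll(x, 1, axis=-1) + np.roll(x, -1, axis=-1))
    return neigh + 0.3 * rng.standard_normal(x.shape)

x_true, obs = np.zeros(D), np.empty((T, D))
for t in range(T):
    x_true = step(x_true)
    obs[t] = x_true + 0.5 * rng.standard_normal(D)       # noisy local observations

particles = np.zeros((J, D))
filter_means = np.empty((T, D))
for t in range(T):
    particles = step(particles)
    for b in blocks:                                     # block-wise weighting and resampling
        logw = -0.5 * np.sum((obs[t, b] - particles[:, b]) ** 2, axis=1) / 0.5 ** 2
        w = np.exp(logw - logw.max())
        w /= w.sum()
        idx = rng.choice(J, size=J, p=w)
        particles[:, b] = particles[idx][:, b]
    filter_means[t] = particles.mean(axis=0)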


Concentration bounds for the empirical angular measure with statistical learning applications

Clémençon, Stéphan, Jalalzai, Hamid, Sabourin, Anne, Segers, Johan

arXiv.org Machine Learning

The angular measure on the unit sphere characterizes the first-order dependence structure of the components of a random vector in extreme regions and is defined in terms of standardized margins. Its statistical recovery is an important step in learning problems involving observations far away from the center. In the common situation where the components of the vector have different distributions, the rank transformation offers a convenient and robust way of standardizing the data in order to build an empirical version of the angular measure based on the most extreme observations. However, the study of the sampling distribution of the resulting empirical angular measure is challenging. The purpose of this paper is to establish finite-sample bounds for the maximal deviations between the empirical and true angular measures, uniformly over classes of Borel sets of controlled combinatorial complexity. The bounds are valid with high probability and scale essentially as the inverse square root of the effective sample size, up to a logarithmic factor. Discarding the most extreme observations yields a truncated version of the empirical angular measure for which the logarithmic factor in the concentration bound is replaced by a factor depending on the truncation level. The bounds are applied to provide performance guarantees for two statistical learning procedures tailored to extreme regions of the input space and built upon the empirical angular measure: binary classification in extreme regions through empirical risk minimization and unsupervised anomaly detection through minimum-volume sets of the sphere.
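
The construction described above is short to write down in code. The sketch below standardizes each margin via ranks, keeps the k observations with the largest radius, projects them onto the unit simplex, and evaluates the empirical angular measure on a simple set. The choice of the L1 norm, the value of k, the test set A, and the simulated heavy-tailed data are assumptions made for illustration.

# Minimal sketch of the rank-based empirical angular measure.
import numpy as np

rng = np.random.default_rng(3)
n, k = 5000, 200

# toy bivariate heavy-tailed sample with asymptotic dependence
z = rng.pareto(2.0, size=n)
x = np.column_stack([z * rng.uniform(0.5, 1.5, n), z * rng.uniform(0.5, 1.5, n)])

# rank transformation to (approximately) unit-Pareto margins
ranks = np.argsort(np.argsort(x, axis=0), axis=0) + 1
v = 1.0 / (1.0 - ranks / (n + 1.0))

# keep the k most extreme points (largest L1 norm) and project onto the simplex
r = v.sum(axis=1)
extreme = v[np.argsort(r)[-k:]]
angles = extreme / extreme.sum(axis=1, keepdims=True)

# empirical angular measure of a Borel set, e.g. A = {w : w_1 > 0.8}
print("empirical angular probability of A:", np.mean(angles[:, 0] > 0.8))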


Conditions and Assumptions for Constraint-based Causal Structure Learning

Sadeghi, Kayvan, Soo, Terry

arXiv.org Machine Learning

The paper formalizes constraint-based structure learning of the "true" causal graph from observed data when unobserved variables may also be present. We define a "generic" structure learning algorithm, which provides conditions that, under the faithfulness assumption, the output of all known exact algorithms in the literature must satisfy, and which outputs graphs that are Markov equivalent to the causal graph. More importantly, we provide clear assumptions, weaker than faithfulness, under which the same generic algorithm outputs graphs that are Markov equivalent to the causal graph. We provide the theory for the general class of models under the assumption that the distribution is Markovian with respect to the true causal graph, and we specialize the definitions and results for structural causal models.
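
For readers unfamiliar with constraint-based learning, the sketch below shows the basic mechanism that the paper abstracts: query conditional independences in the data and delete edges accordingly (the skeleton phase of a PC-style algorithm). The Gaussian partial-correlation test, the significance level, and the toy linear structural causal model are assumptions of this sketch; the paper's generic algorithm and its weaker-than-faithfulness conditions are considerably more general and also cover unobserved variables.

# Hedged sketch: skeleton search with partial-correlation CI tests.
import numpy as np
from itertools import combinations
from scipy import stats

def ci_test(data, i, j, S, alpha=0.01):
    """Fisher-z test; returns True if X_i is judged independent of X_j given X_S."""
    idx = [i, j] + list(S)
    P = np.linalg.inv(np.corrcoef(data[:, idx].T))   # precision of the submatrix
    r = -P[0, 1] / np.sqrt(P[0, 0] * P[1, 1])        # partial correlation
    z = 0.5 * np.log((1 + r) / (1 - r)) * np.sqrt(len(data) - len(S) - 3)
    return 2 * (1 - stats.norm.cdf(abs(z))) > alpha

def skeleton(data, max_cond=2):
    d = data.shape[1]
    adj = {(i, j) for i, j in combinations(range(d), 2)}
    for size in range(max_cond + 1):                 # growing conditioning sets
        for (i, j) in list(adj):
            others = [k for k in range(d) if k not in (i, j)]
            if any(ci_test(data, i, j, S) for S in combinations(others, size)):
                adj.discard((i, j))
    return adj

# toy linear SCM: X0 -> X1 -> X2, with no edge X0 -> X2
rng = np.random.default_rng(4)
n = 2000
x0 = rng.standard_normal(n)
x1 = 0.8 * x0 + rng.standard_normal(n)
x2 = 0.8 * x1 + rng.standard_normal(n)
print(skeleton(np.column_stack([x0, x1, x2])))       # expect edges (0, 1) and (1, 2) only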


Uniform Inference in High-Dimensional Generalized Additive Models

Bach, Philipp, Klaassen, Sven, Kueck, Jannis, Spindler, Martin

arXiv.org Machine Learning

We develop a method for uniformly valid confidence bands of a nonparametric component $f_1$ in the general additive model $Y=f_1(X_1)+\ldots + f_p(X_p) + \varepsilon$ in a high-dimensional setting. We employ sieve estimation and embed it in a high-dimensional Z-estimation framework, allowing us to construct uniformly valid confidence bands for the first component $f_1$. As usual in high-dimensional settings where the number of regressors $p$ may increase with the sample size, a sparsity assumption is critical for the analysis. We also run simulation studies which show that our proposed method gives reliable results in terms of estimation and coverage properties, even in small samples. Finally, we illustrate our procedure with an empirical application, demonstrating the implementation and use of the proposed method in practice.
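
The following simplified sketch conveys the estimation strategy at a high level: expand $f_1$ in a spline sieve, remove the influence of the remaining high-dimensional components from both the response and the sieve columns by Lasso partialling-out, and refit the sieve coefficients by least squares. The basis, penalty level, and the purely pointwise normal band at the end are illustrative assumptions; the paper's uniformly valid bands rest on a full high-dimensional Z-estimation analysis that is not reproduced here.

# Hedged, simplified sketch of sieve estimation with Lasso partialling-out.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(5)
n, p = 500, 100
X = rng.uniform(0, 1, size=(n, p))
y = np.sin(2 * np.pi * X[:, 0]) + 0.5 * X[:, 1] - 0.5 * X[:, 2] + 0.5 * rng.standard_normal(n)

def basis(u, knots=np.linspace(0.1, 0.9, 7)):
    """Cubic truncated power spline basis for the first covariate (no intercept)."""
    cols = [u, u ** 2, u ** 3] + [np.clip(u - t, 0, None) ** 3 for t in knots]
    return np.column_stack(cols)

B1, Z = basis(X[:, 0]), X[:, 1:]

# partial out the high-dimensional nuisance from y and from each sieve column
res_y = y - Lasso(alpha=0.05).fit(Z, y).predict(Z)
res_B = np.column_stack([b - Lasso(alpha=0.05).fit(Z, b).predict(Z) for b in B1.T])
beta = np.linalg.lstsq(res_B, res_y, rcond=None)[0]

# fitted f_1 (identified up to a constant) with a crude pointwise band
grid = np.linspace(0.01, 0.99, 100)
Bg = basis(grid)
sigma2 = np.mean((res_y - res_B @ beta) ** 2)
cov = sigma2 * np.linalg.pinv(res_B.T @ res_B)
se = np.sqrt(np.einsum("ij,jk,ik->i", Bg, cov, Bg))
f1_hat = Bg @ beta
lower, upper = f1_hat - 1.96 * se, f1_hat + 1.96 * se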


Oracle inequalities for image denoising with total variation regularization

Ortelli, Francesco, van de Geer, Sara

arXiv.org Machine Learning

We derive oracle results for discrete image denoising with a total variation penalty. We consider the least squares estimator with a penalty on the $\ell^1$-norm of the total discrete derivative of the image. This estimator falls into the class of analysis estimators. A bound on the effective sparsity by means of an interpolating matrix allows us to obtain oracle inequalities with fast rates. The bound is an extension of the bound by Ortelli and van de Geer [2019c] to the two-dimensional case. We also present an oracle inequality with slow rates, which matches, up to a logarithmic term, the rate obtained for the same estimator by Mammen and van de Geer [1997]. The key ingredients for our results are the projection arguments used to bound the empirical process, due to Dalalyan et al. [2017].
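
The estimator analyzed in the paper is a convex program and is easy to state in code. The sketch below fits the least squares estimator with an l1 penalty on the total discrete derivative (anisotropic total variation) of the image. The noisy image, the penalty level, and the use of CVXPY as a generic solver are assumptions made for illustration.

# Minimal sketch: total variation regularized least squares for image denoising.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(6)
m = 32
truth = np.zeros((m, m))
truth[8:24, 8:24] = 1.0                              # piecewise constant image
y = truth + 0.3 * rng.standard_normal((m, m))

X = cp.Variable((m, m))
lam = 0.6
tv = cp.sum(cp.abs(X[1:, :] - X[:-1, :])) + cp.sum(cp.abs(X[:, 1:] - X[:, :-1]))
prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(y - X) + lam * tv))
prob.solve()
denoised = X.value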


Prediction bounds for (higher order) total variation regularized least squares

Ortelli, Francesco, van de Geer, Sara

arXiv.org Machine Learning

We establish oracle inequalities for the least squares estimator $\hat f$ with a penalty on the total variation of $\hat f$ or on its higher order differences. Our main tool is an interpolating vector that leads to upper bounds for the effective sparsity. This allows one to show that the penalty on the $k^{\text{th}}$ order differences leads to an estimator $\hat f$ that can adapt to the number of jumps in the $(k-1)^{\text{th}}$ order differences. We present the details for $k=2, \ 3$ and outline a framework for deriving the result for general $k\in \mathbb{N}$.
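
For $k=2$ the estimator described above is a trend-filtering-type convex program: an l1 penalty on second-order differences lets the fit adapt to the number of kinks, that is, jumps in the first-order differences. The sketch below writes it down directly; the signal, noise level, penalty weight, and use of CVXPY are illustrative assumptions.

# Hedged sketch: least squares with an l1 penalty on k-th order differences (k = 2).
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(7)
n, k, lam = 300, 2, 5.0
t = np.linspace(0, 1, n)
truth = np.interp(t, [0, 0.3, 0.6, 1.0], [0, 2, -1, 1])   # piecewise linear signal
y = truth + 0.3 * rng.standard_normal(n)

D = np.diff(np.eye(n), k, axis=0)                    # k-th order difference operator
f = cp.Variable(n)
prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(y - f) + lam * cp.norm1(D @ f)))
prob.solve()
fhat = f.value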


Multivariate extensions of isotonic regression and total variation denoising via entire monotonicity and Hardy-Krause variation

Fang, Billy, Guntuboyina, Adityanand, Sen, Bodhisattva

arXiv.org Machine Learning

We consider the problem of nonparametric regression when the covariate is $d$-dimensional, where $d \geq 1$. In this paper we introduce and study two nonparametric least squares estimators (LSEs) in this setting: the entirely monotonic LSE and the constrained Hardy-Krause variation LSE. We show that these two LSEs are natural generalizations of univariate isotonic regression and univariate total variation denoising, respectively, to multiple dimensions. We discuss the characterization and computation of these two LSEs obtained from $n$ data points. We provide a detailed study of their risk properties under the squared error loss and a fixed uniform lattice design. We show that the finite sample risk of these LSEs is always bounded from above by $n^{-2/3}$ modulo logarithmic factors depending on $d$; thus these nonparametric LSEs avoid the curse of dimensionality to some extent. For the case of the Hardy-Krause variation LSE, we also show that logarithmic factors which increase with $d$ are necessary in the risk upper bound, by proving a minimax lower bound. Further, we illustrate that these LSEs are particularly useful in fitting rectangular piecewise constant functions. Specifically, we show that the risk of the entirely monotonic LSE is almost parametric (at most $1/n$ up to logarithmic factors) when the true function is well approximable by a rectangular piecewise constant entirely monotone function with not too many constant pieces. A similar result is also shown to hold for the constrained Hardy-Krause variation LSE for a simple subclass of rectangular piecewise constant functions. We believe that the proposed LSEs yield a novel approach to estimating multivariate functions using convex optimization that avoids the curse of dimensionality to some extent.
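
To make the entirely monotonic LSE concrete on a fixed 2D lattice design, the sketch below poses it as a convex quadratic program. The constraint set used here (nondecreasing first differences along the first row and first column, plus nonnegative mixed second differences everywhere) is one standard lattice description of entire monotonicity and is an assumption of this sketch; the paper gives the precise characterization, the computational details, and the corresponding Hardy-Krause variation estimator.

# Hedged sketch: entirely monotonic least squares on a 2D lattice via a convex program.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(8)
m = 20
i, j = np.meshgrid(np.linspace(0, 1, m), np.linspace(0, 1, m), indexing="ij")
truth = (i > 0.5).astype(float) * (j > 0.4).astype(float)   # rectangular piecewise constant
y = truth + 0.4 * rng.standard_normal((m, m))

T = cp.Variable((m, m))
constraints = [
    T[1:, 0] - T[:-1, 0] >= 0,                       # first column nondecreasing
    T[0, 1:] - T[0, :-1] >= 0,                       # first row nondecreasing
    T[1:, 1:] - T[1:, :-1] - T[:-1, 1:] + T[:-1, :-1] >= 0,   # mixed second differences
]
prob = cp.Problem(cp.Minimize(cp.sum_squares(y - T)), constraints)
prob.solve()
theta_hat = T.value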